METHOD AND APPARATUS FOR PREDICTIVELY
QUANTIZING VOICED SPEECH

BACKGROUND OF THE INVENTION 

I. Field of the Invention

The present invention pertains generally to the field of speech 
processing, and more specifically to methods and apparatus for predictively 
quantizing voiced speech. 

II. Background

Transmission of voice by digital techniques has become widespread, 
particularly in long distance and digital radio telephone applications. This, 
in turn, has created interest in determining the least amount of information 
that can be sent over a channel while maintaining the perceived quality of
the reconstructed speech. If speech is transmitted by simply sampling and
digitizing, a data rate on the order of sixty-four kilobits per second (kbps) is
required to achieve the speech quality of a conventional analog telephone.
However, through the use of speech analysis, followed by the appropriate 
coding, transmission, and resynthesis at the receiver, a significant reduction
in the data rate can be achieved.

Devices for compressing speech find use in many fields of 
telecommunications. An exemplary field is wireless communications. The 
field of wireless communications has many applications including, e.g., 
cordless telephones, paging, wireless local loops, wireless telephony such as
cellular and PCS telephone systems, mobile Internet Protocol (IP) telephony,
and satellite communication systems. A particularly important application 
is wireless telephony for mobile subscribers. 

Various over-the-air interfaces have been developed for wireless 
communication systems including, e.g., frequency division multiple access
(FDMA), time division multiple access (TDMA), and code division multiple
access (CDMA). In connection therewith, various domestic and 
international standards have been established including, e.g., Advanced 
Mobile Phone Service (AMPS), Global System for Mobile Communications 
(GSM), and Interim Standard 95 (IS-95). An exemplary wireless telephony
communication system is a code division multiple access (CDMA) system.
The IS-95 standard and its derivatives, IS-95A, ANSI J-STD-008, IS-95B, 
proposed third generation standards IS-95C and IS-2000, etc. (referred to 
collectively herein as IS-95), are promulgated by the Telecommunication 
Industry Association (TIA) and other well-known standards bodies to specify
the use of a CDMA over-the-air interface for cellular or PCS telephony 
communication systems. Exemplary wireless communication systems 
configured substantially in accordance with the use of the IS-95 standard are 
described in U.S. Patent Nos. 5,103,459 and 4,901,307, which are assigned to
the assignee of the present invention and fully incorporated herein by 
reference. 

Devices that employ techniques to compress speech by extracting 
parameters that relate to a model of human speech generation are called
speech coders. A speech coder divides the incoming speech signal into
blocks of time, or analysis frames. Speech coders typically comprise an 
encoder and a decoder. The encoder analyzes the incoming speech frame to 
extract certain relevant parameters, and then quantizes the parameters into 
binary representation, i.e., to a set of bits or a binary data packet. The data
packets are transmitted over the communication channel to a receiver and a
decoder. The decoder processes the data packets, unquantizes them to 
produce the parameters, and resynthesizes the speech frames using the 
unquantized parameters. 

The function of the speech coder is to compress the digitized speech
signal into a low-bit-rate signal by removing all of the natural redundancies
inherent in speech. The digital compression is achieved by representing the
input speech frame with a set of parameters and employing quantization to
represent the parameters with a set of bits. If the input speech frame has a
number of bits $N_i$ and the data packet produced by the speech coder has a
number of bits $N_o$, the compression factor achieved by the speech coder is
$C_r = N_i/N_o$. The challenge is to retain high voice quality of the decoded speech
while achieving the target compression factor. The performance of a speech 
coder depends on (1) how well the speech model, or the combination of the 
analysis and synthesis process described above, performs, and (2) how well
the parameter quantization process is performed at the target bit rate of $N_o$
bits per frame. The goal of the speech model is thus to capture the essence of 
the speech signal, or the target voice quality, with a small set of parameters 
for each frame. 
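
As a worked illustration of the compression factor, the following minimal sketch computes the ratio of input bits to output bits for one frame. The 8-bit sample width and the 4 kbps target rate are assumptions chosen only to match the sixty-four kbps and low-bit-rate figures mentioned in this background; the function and variable names are hypothetical.

```python
def compression_factor(bits_in: int, bits_out: int) -> float:
    """Return the compression factor C_r = N_i / N_o for one frame."""
    if bits_out <= 0:
        raise ValueError("bits_out must be positive")
    return bits_in / bits_out


# Assumed illustration values (not taken from the specification):
SAMPLE_RATE_HZ = 8000     # 8 kHz sampling
FRAME_MS = 20             # 20 ms analysis frame
BITS_PER_SAMPLE = 8       # 8 kHz x 8 bits = 64 kbps input
TARGET_RATE_BPS = 4000    # assumed 4 kbps coder output

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples in one frame
n_i = samples_per_frame * BITS_PER_SAMPLE               # input bits per frame
n_o = TARGET_RATE_BPS * FRAME_MS // 1000                # output bits per frame
c_r = compression_factor(n_i, n_o)
```

Under these assumptions a frame of 1280 input bits is represented by an 80-bit packet, which corresponds to a compression factor of 16.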

Perhaps most important in the design of a speech coder is the search
for a good set of parameters (including vectors) to describe the speech signal.
A good set of parameters requires a low system bandwidth for the 
reconstruction of a perceptually accurate speech signal. Pitch, signal power, 
spectral envelope (or formants), amplitude spectra, and phase spectra are
examples of the speech coding parameters. 

Speech coders may be implemented as time-domain coders, which 
attempt to capture the time-domain speech waveform by employing high 
time-resolution processing to encode small segments of speech (typically 5
millisecond (ms) subframes) at a time. For each subframe, a high-precision 
representative from a codebook space is found by means of various search 
algorithms known in the art. Alternatively, speech coders may be 
implemented as frequency-domain coders, which attempt to capture the
short-term speech spectrum of the input speech frame with a set of
parameters (analysis) and employ a corresponding synthesis process to 
recreate the speech waveform from the spectral parameters. The parameter 
quantizer preserves the parameters by representing them with stored 
representations of code vectors in accordance with known quantization
techniques described in A. Gersho & R.M. Gray, Vector Quantization and
Signal Compression (1992). 

A well-known time-domain speech coder is the Code Excited Linear 
Predictive (CELP) coder described in L.B. Rabiner & R.W. Schafer, Digital 
Processing of Speech Signals 396-453 (1978), which is fully incorporated
herein by reference. In a CELP coder, the short-term correlations, or
redundancies, in the speech signal are removed by a linear prediction (LP) 
analysis, which finds the coefficients of a short-term formant filter. 
Applying the short-term prediction filter to the incoming speech frame 
generates an LP residue signal, which is further modeled and quantized with
long-term prediction filter parameters and a subsequent stochastic codebook.
Thus, CELP coding divides the task of encoding the time-domain speech 
waveform into the separate tasks of encoding the LP short-term filter 
coefficients and encoding the LP residue. Time-domain coding can be 
performed at a fixed rate (i.e., using the same number of bits, $N_o$, for each
frame) or at a variable rate (in which different bit rates are used for different
types of frame contents). Variable-rate coders attempt to use only the 
amount of bits needed to encode the codec parameters to a level adequate to 
obtain a target quality. An exemplary variable rate CELP coder is described in 
U.S. Patent No. 5,414,796, which is assigned to the assignee of the present
invention and fully incorporated herein by reference.

Time-domain coders such as the CELP coder typically rely upon a high 
number of bits, $N_o$, per frame to preserve the accuracy of the time-domain
speech waveform. Such coders typically deliver excellent voice quality
provided that the number of bits, $N_o$, per frame is relatively large (e.g., 8 kbps or
above). However, at low bit rates (4 kbps and below), time-domain coders fail 
to retain high quality and robust performance due to the limited number of 
available bits. At low bit rates, the limited codebook space clips the 
waveform-matching capability of conventional time-domain coders, which
are so successfully deployed in higher-rate commercial applications. Hence, 
despite improvements over time, many CELP coding systems operating at 
low bit rates suffer from perceptually significant distortion typically 
characterized as noise. 

There is presently a surge of research interest and strong commercial
need to develop a high-quality speech coder operating at medium to low bit
rates (i.e., in the range of 2.4 to 4 kbps and below). The application areas 
include wireless telephony, satellite communications, Internet telephony, 
various multimedia and voice-streaming applications, voice mail, and other
voice storage systems. The driving forces are the need for high capacity and
the demand for robust performance under packet loss situations. Various 
recent speech coding standardization efforts are another direct driving force 
propelling research and development of low-rate speech coding algorithms. 
A low-rate speech coder creates more channels, or users, per allowable
application bandwidth, and a low-rate speech coder coupled with an
additional layer of suitable channel coding can fit the overall bit-budget of 
coder specifications and deliver a robust performance under channel error 
conditions. 

One effective technique to encode speech efficiently at low bit rates is 
multimode coding. An exemplary multimode coding technique is described
in U.S. Application Serial No. 09/217,341, entitled VARIABLE RATE 
SPEECH CODING, filed December 21, 1998, assigned to the assignee of the 
present invention, and fully incorporated herein by reference. 
Conventional multimode coders apply different modes, or encoding-decoding
algorithms, to different types of input speech frames. Each mode,
or encoding-decoding process, is customized to optimally represent a certain 
type of speech segment, such as, e.g., voiced speech, unvoiced speech, 
transition speech (e.g., between voiced and unvoiced), and background noise 
(silence, or nonspeech) in the most efficient manner. An external, open-loop
mode decision mechanism examines the input speech frame and
makes a decision regarding which mode to apply to the frame. The open-loop
mode decision is typically performed by extracting a number of
parameters from the input frame, evaluating the parameters as to certain 
temporal and spectral characteristics, and basing a mode decision upon the
evaluation.

Coding systems that operate at rates on the order of 2.4 kbps are
generally parametric in nature. That is, such coding systems operate by 
transmitting parameters describing the pitch-period and the spectral
envelope (or formants) of the speech signal at regular intervals. Illustrative 
of these so-called parametric coders is the LP vocoder system. 

LP vocoders model a voiced speech signal with a single pulse per pitch 
period. This basic technique may be augmented to include transmission
information about the spectral envelope, among other things. Although LP
vocoders provide reasonable performance generally, they may introduce
perceptually significant distortion, typically characterized as buzz. 

In recent years, coders have emerged that are hybrids of both 
waveform coders and parametric coders. Illustrative of these so-called
hybrid coders is the prototype-waveform interpolation (PWI) speech coding
system. The PWI coding system may also be known as a prototype pitch 
period (PPP) speech coder. A PWI coding system provides an efficient 
method for coding voiced speech. The basic concept of PWI is to extract a 
representative pitch cycle (the prototype waveform) at fixed intervals, to
transmit its description, and to reconstruct the speech signal by interpolating
between the prototype waveforms. The PWI method may operate either on 
the LP residual signal or on the speech signal. An exemplary PWI, or PPP, 
speech coder is described in U.S. Application Serial No. 09/217,494, entitled 
PERIODIC SPEECH CODING, filed December 21, 1998, assigned to the
assignee of the present invention, and fully incorporated herein by
reference. Other PWI, or PPP, speech coders are described in U.S. Patent No. 
5,884,253 and W. Bastiaan Kleijn & Wolfgang Granzow, Methods for
Waveform Interpolation in Speech Coding, in 1 Digital Signal Processing 
215-230 (1991). 

In most conventional speech coders, the parameters of a given pitch
prototype, or of a given frame, are each individually quantized and
transmitted by the encoder. In addition, a difference value is transmitted for 
each parameter. The difference value specifies the difference between the 
parameter value for the current frame or prototype and the parameter value
for the previous frame or prototype. However, quantizing the parameter
values and the difference values requires using bits (and hence bandwidth). 
In a low-bit-rate speech coder, it is advantageous to transmit the least 
number of bits possible to maintain satisfactory voice quality. For this 
reason, in conventional low-bit-rate speech coders, only the absolute
parameter values are quantized and transmitted. It would be desirable to 
decrease the number of bits transmitted without decreasing the 
informational value. Thus, there is a need for a predictive scheme for 
quantizing voiced speech that decreases the bit rate of a speech coder.

SUMMARY OF THE INVENTION 

The present invention is directed to a predictive scheme for 
quantizing voiced speech that decreases the bit rate of a speech coder. 

Accordingly, in one aspect of the invention, a method of quantizing
information about a parameter of speech is provided. The method
advantageously includes generating at least one weighted value of the
parameter for at least one previously processed frame of speech, wherein the 
sum of all weights used is one; subtracting the at least one weighted value
from a value of the parameter for a currently processed frame of speech to
yield a difference value; and quantizing the difference value. 
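
The steps of this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the uniform scalar quantizer, the step size, and all function names are hypothetical, and the specification does not prescribe them. The only constraint taken from the text is that the weights applied to previously processed frames sum to one.

```python
import math


def predict_from_history(history, weights):
    """Weighted combination of parameter values from previous frames."""
    if len(history) != len(weights):
        raise ValueError("history and weights must have equal length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to one")
    return sum(w * v for w, v in zip(weights, history))


def encode_parameter(current, history, weights, step):
    """Quantize the difference between the current value and the prediction.

    A uniform scalar quantizer with the given step is an assumption here.
    """
    difference = current - predict_from_history(history, weights)
    return int(round(difference / step))


def decode_parameter(index, history, weights, step):
    """Rebuild the parameter from the quantizer index and the same history."""
    return predict_from_history(history, weights) + index * step


# Hypothetical example: a pitch-lag-like parameter over two previous frames.
history = [50.0, 52.0]
weights = [0.25, 0.75]
index = encode_parameter(53.0, history, weights, step=0.5)
decoded = decode_parameter(index, history, weights, step=0.5)
```

In a practical coder the encoder would presumably predict from previously decoded values rather than unquantized ones, so that encoder and decoder stay in step; the sketch omits this for brevity.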

In another aspect of the invention, a speech coder configured to 
quantize information about a parameter of speech is provided. The speech 
coder advantageously includes means for generating at least one weighted
value of the parameter for at least one previously processed frame of speech,
wherein the sum of all weights used is one; means for subtracting the at least 
one weighted value from a value of the parameter for a currently processed 
frame of speech to yield a difference value; and means for quantizing the 
difference value. 

In another aspect of the invention, an infrastructure element
configured to quantize information about a parameter of speech is provided.
The infrastructure element advantageously includes a parameter generator 
configured to generate at least one weighted value of the parameter for at 
least one previously processed frame of speech, wherein the sum of all
weights used is one; and a quantizer coupled to the parameter generator and
configured to subtract the at least one weighted value from a value of the 
parameter for a currently processed frame of speech to yield a difference 
value, and to quantize the difference value. 

In another aspect of the invention, a subscriber unit configured to
quantize information about a parameter of speech is provided. The
subscriber unit advantageously includes a processor; and a storage medium 
coupled to the processor and containing a set of instructions executable by 
the processor to generate at least one weighted value of the parameter for at 
least one previously processed frame of speech, wherein the sum of all
weights used is one, and subtract the at least one weighted value from a 
value of the parameter for a currently processed frame of speech to yield a 
difference value, and to quantize the difference value. 

In another aspect of the invention, a method of quantizing
information about a phase parameter of speech is provided. The method 
advantageously includes generating at least one modified value of the phase 
parameter for at least one previously processed frame of speech; applying a 
number of phase shifts to the at least one modified value, the number of 
phase shifts being greater than or equal to zero; subtracting the at least one
modified value from a value of the phase parameter for a currently 
processed frame of speech to yield a difference value; and quantizing the 
difference value. 
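
A minimal sketch of the phase variant follows. It assumes that the modified previous-frame phase is already available as a single value in radians, that each phase shift is an additive offset, that the difference is wrapped into one period, and that a uniform quantizer is used; none of these choices is specified in this summary, and all names are hypothetical.

```python
import math


def wrap_phase(phi):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def predicted_phase(previous_modified, shifts):
    """Apply zero or more additive phase shifts to the modified previous value."""
    return previous_modified + sum(shifts)


def encode_phase(current, previous_modified, shifts, step):
    """Quantize the wrapped difference between current and predicted phase."""
    difference = wrap_phase(current - predicted_phase(previous_modified, shifts))
    return int(round(difference / step))


def decode_phase(index, previous_modified, shifts, step):
    """Rebuild the phase from the quantizer index and the same prediction."""
    return wrap_phase(predicted_phase(previous_modified, shifts) + index * step)


# Hypothetical values in radians; the shift list may also be empty.
index = encode_phase(1.0, previous_modified=0.5, shifts=[0.25], step=0.125)
decoded = decode_phase(index, previous_modified=0.5, shifts=[0.25], step=0.125)
```

Passing an empty list for `shifts` corresponds to the case in which the number of phase shifts is zero.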

In another aspect of the invention, a speech coder configured to
quantize information about a phase parameter of speech is provided. The
speech coder advantageously includes means for generating at least one 
modified value of the phase parameter for at least one previously processed 
frame of speech; means for applying a number of phase shifts to the at least 
one modified value, the number of phase shifts being greater than or equal
to zero; means for subtracting the at least one modified value from a value
of the phase parameter for a currently processed frame of speech to yield a 
difference value; and means for quantizing the difference value. 

In another aspect of the invention, a subscriber unit configured to
quantize information about a phase parameter of speech is provided. The
subscriber unit advantageously includes a processor; and a storage medium
coupled to the processor and containing a set of instructions executable by 
the processor to generate at least one modified value of the phase parameter 
for at least one previously processed frame of speech, apply a number of 
phase shifts to the at least one modified value, the number of phase shifts
being greater than or equal to zero, subtract the at least one modified value
from a value of the parameter for a currently processed frame of speech to 
yield a difference value, and to quantize the difference value. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a wireless telephone system.

FIG. 2 is a block diagram of a communication channel terminated at 
each end by speech coders. 

FIG. 3 is a block diagram of a speech encoder. 

FIG. 4 is a block diagram of a speech decoder.

FIG. 5 is a block diagram of a speech coder including 
encoder/transmitter and decoder/receiver portions. 

FIG. 6 is a graph of signal amplitude versus time for a segment of 
voiced speech.

FIG. 7 is a block diagram of a quantizer that can be used in a speech 
encoder. 

FIG. 8 is a block diagram of a processor coupled to a storage medium. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The exemplary embodiments described hereinbelow reside in a 
wireless telephony communication system configured to employ a CDMA 
over-the-air interface. Nevertheless, it would be understood by those skilled
in the art that a method and apparatus for predictively coding voiced speech
embodying features of the instant invention may reside in any of various 
communication systems employing a wide range of technologies known to 
those of skill in the art. 

As illustrated in FIG. 1, a CDMA wireless telephone system generally
includes a plurality of mobile subscriber units 10, a plurality of base stations
12, base station controllers (BSCs) 14, and a mobile switching center (MSC) 
16. The MSC 16 is configured to interface with a conventional public switched
telephone network (PSTN) 18. The MSC 16 is also configured to interface 
with the BSCs 14. The BSCs 14 are coupled to the base stations 12 via
backhaul lines. The backhaul lines may be configured to support any of
several known interfaces including, e.g., E1/T1, ATM, IP, PPP, Frame Relay,
HDSL, ADSL, or xDSL. It is understood that there may be more than two 
BSCs 14 in the system. Each base station 12 advantageously includes at least 
one sector (not shown), each sector comprising an omnidirectional antenna
or an antenna pointed in a particular direction radially away from the base
station 12. Alternatively, each sector may comprise two antennas for 
diversity reception. Each base station 12 may advantageously be designed to 
support a plurality of frequency assignments. The intersection of a sector 
and a frequency assignment may be referred to as a CDMA channel. The
base stations 12 may also be known as base station transceiver subsystems
(BTSs) 12. Alternatively, "base station" may be used in the industry to refer 
collectively to a BSC 14 and one or more BTSs 12. The BTSs 12 may also be 
denoted "cell sites" 12. Alternatively, individual sectors of a given BTS 12 
may be referred to as cell sites. The mobile subscriber units 10 are typically
cellular or PCS telephones 10. The system is advantageously configured for 
use in accordance with the IS-95 standard. 

During typical operation of the cellular telephone system, the base 
stations 12 receive sets of reverse link signals from sets of mobile units 10.
The mobile units 10 are conducting telephone calls or other 
communications. Each reverse link signal received by a given base station 
12 is processed within that base station 12. The resulting data is forwarded to 
the BSCs 14. The BSCs 14 provide call resource allocation and mobility
management functionality including the orchestration of soft handoffs
between base stations 12. The BSCs 14 also route the received data to the
MSC 16, which provides additional routing services for interface with the 
PSTN 18. Similarly, the PSTN 18 interfaces with the MSC 16, and the MSC 
16 interfaces with the BSCs 14, which in turn control the base stations 12 to
transmit sets of forward link signals to sets of mobile units 10. It should be
understood by those of skill that the subscriber units 10 may be fixed units in 
alternate embodiments. 

In FIG. 2 a first encoder 100 receives digitized speech samples s(n) and
encodes the samples s(n) for transmission on a transmission medium 102, or
communication channel 102, to a first decoder 104. The decoder 104 decodes
the encoded speech samples and synthesizes an output speech signal
$s_{SYNTH}(n)$. For transmission in the opposite direction, a second encoder 106
encodes digitized speech samples s(n), which are transmitted on a 
communication channel 108. A second decoder 110 receives and decodes the
encoded speech samples, generating a synthesized output speech signal
$s_{SYNTH}(n)$.

The speech samples s(n) represent speech signals that have been 
digitized and quantized in accordance with any of various methods known 
in the art including, e.g., pulse code modulation (PCM), companded μ-law,
or A-law. As known in the art, the speech samples s(n) are organized into
frames of input data wherein each frame comprises a predetermined 
number of digitized speech samples s(n). In an exemplary embodiment, a 
sampling rate of 8 kHz is employed, with each 20 ms frame comprising 160 
samples. In the embodiments described below, the rate of data transmission
may advantageously be varied on a frame-by-frame basis from full rate to
half rate to quarter rate to eighth rate. Varying the data transmission rate is
advantageous because lower bit rates may be selectively employed for frames 
containing relatively less speech information. As understood by those 
skilled in the art, other sampling rates and/or frame sizes may be used. Also
in the embodiments described below, the speech encoding (or coding) mode 
may be varied on a frame-by-frame basis in response to the speech 
information or energy of the frame. 
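
The framing described above (8 kHz sampling, 20 ms frames, 160 samples per frame) can be sketched as follows. The function name is hypothetical, and dropping a trailing partial frame is an assumption made for simplicity; the specification does not say how leftover samples are handled.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sample sequence into consecutive fixed-length frames.

    A trailing partial frame is discarded (an assumption of this sketch).
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 160 for the defaults
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# 400 dummy samples yield two full frames; the last 80 samples are dropped.
samples = list(range(400))
frames = split_into_frames(samples)
```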

The first encoder 100 and the second decoder 110 together comprise a
first speech coder (encoder/decoder), or speech codec. The speech coder
could be used in any communication device for transmitting speech signals, 
including, e.g., the subscriber units, BTSs, or BSCs described above with 
reference to FIG. 1. Similarly, the second encoder 106 and the first decoder
104 together comprise a second speech coder. It is understood by those of
skill in the art that speech coders may be implemented with a digital signal 
processor (DSP), an application-specific integrated circuit (ASIC), discrete gate 
logic, firmware, or any conventional programmable software module and a 
microprocessor. The software module could reside in RAM memory, flash
memory, registers, or any other form of storage medium known in the art.
Alternatively, any conventional processor, controller, or state machine 
could be substituted for the microprocessor. Exemplary ASICs designed 
specifically for speech coding are described in U.S. Patent No. 5,727,123, 
assigned to the assignee of the present invention and fully incorporated
herein by reference, and U.S. Application Serial No. 08/197,417, entitled
VOCODER ASIC, filed February 16, 1994, assigned to the assignee of the 
present invention, and fully incorporated herein by reference. 

In FIG. 3 an encoder 200 that may be used in a speech coder includes a 
mode decision module 202, a pitch estimation module 204, an LP analysis
module 206, an LP analysis filter 208, an LP quantization module 210, and a
residue quantization module 212. Input speech frames s(n) are provided to 
the mode decision module 202, the pitch estimation module 204, the LP 
analysis module 206, and the LP analysis filter 208. The mode decision 
module 202 produces a mode index $I_M$ and a mode M based upon the
periodicity, energy, signal-to-noise ratio (SNR), or zero crossing rate, among
other features, of each input speech frame s(n). Various methods of 
classifying speech frames according to periodicity are described in U.S. Patent 
No. 5,911,128, which is assigned to the assignee of the present invention and 
fully incorporated herein by reference. Such methods are also incorporated
into the Telecommunication Industry Association Interim Standards
TIA/EIA IS-127 and TIA/EIA IS-733. An exemplary mode decision scheme is
also described in the aforementioned U.S. Application Serial No. 09/217,341. 
The pitch estimation module 204 produces a pitch index $I_P$ and a lag
value $P_0$ based upon each input speech frame s(n). The LP analysis module
206 performs linear predictive analysis on each input speech frame s(n) to
generate an LP parameter $a$. The LP parameter $a$ is provided to the LP
quantization module 210. The LP quantization module 210 also receives the
mode M, thereby performing the quantization process in a mode-dependent
manner. The LP quantization module 210 produces an LP index $I_{LP}$ and a
quantized LP parameter $\hat{a}$. The LP analysis filter 208 receives the quantized
LP parameter $\hat{a}$ in addition to the input speech frame s(n). The LP analysis
filter 208 generates an LP residue signal R[n], which represents the error
between the input speech frames s(n) and the reconstructed speech based on
the quantized linear predicted parameters $\hat{a}$. The LP residue R[n], the mode
M, and the quantized LP parameter $\hat{a}$ are provided to the residue
quantization module 212. Based upon these values, the residue
quantization module 212 produces a residue index $I_R$ and a quantized residue
signal $\hat{R}[n]$.

In FIG. 4 a decoder 300 that may be used in a speech coder includes an
LP parameter decoding module 302, a residue decoding module 304, a mode
decoding module 306, and an LP synthesis filter 308. The mode decoding
module 306 receives and decodes a mode index $I_M$, generating therefrom a
mode M. The LP parameter decoding module 302 receives the mode M and
an LP index $I_{LP}$. The LP parameter decoding module 302 decodes the received
values to produce a quantized LP parameter $\hat{a}$. The residue decoding
module 304 receives a residue index $I_R$, a pitch index $I_P$, and the mode index
$I_M$. The residue decoding module 304 decodes the received values to
generate a quantized residue signal $\hat{R}[n]$. The quantized residue signal $\hat{R}[n]$
and the quantized LP parameter $\hat{a}$ are provided to the LP synthesis filter 308,
which synthesizes a decoded output speech signal $\hat{s}[n]$ therefrom.

Operation and implementation of the various modules of the encoder
200 of FIG. 3 and the decoder 300 of FIG. 4 are known in the art and described
in the aforementioned U.S. Patent No. 5,414,796 and L.B. Rabiner & R.W. 
Schafer, Digital Processing of Speech Signals 396-453 (1978). 

In one embodiment a multimode speech encoder 400 communicates 
with a multimode speech decoder 402 across a communication channel, or
transmission medium, 404. The communication channel 404 is
advantageously an RF interface configured in accordance with the IS-95 
standard. It would be understood by those of skill in the art that the encoder 
400 has an associated decoder (not shown). The encoder 400 and its 
associated decoder together form a first speech coder. It would also be
understood by those of skill in the art that the decoder 402 has an associated 
encoder (not shown). The decoder 402 and its associated encoder together 
form a second speech coder. The first and second speech coders may 
advantageously be implemented as part of first and second DSPs, and may
reside in, e.g., a subscriber unit and a base station in a PCS or cellular 
telephone system, or in a subscriber unit and a gateway in a satellite system. 

The encoder 400 includes a parameter calculator 406, a mode 
classification module 408, a plurality of encoding modes 410, and a packet
formatting module 412. The number of encoding modes 410 is shown as n,
which one of skill would understand could signify any reasonable number 
of encoding modes 410. For simplicity, only three encoding modes 410 are 
shown, with a dotted line indicating the existence of other encoding modes 
410. The decoder 402 includes a packet disassembler and packet loss detector
module 414, a plurality of decoding modes 416, an erasure decoder 418, and a
post filter, or speech synthesizer, 420. The number of decoding modes 416 is 
shown as n, which one of skill would understand could signify any 
reasonable number of decoding modes 416. For simplicity, only three 
decoding modes 416 are shown, with a dotted line indicating the existence of
other decoding modes 416.

A speech signal, s(n), is provided to the parameter calculator 406. The 
speech signal is divided into blocks of samples called frames. The value n 
designates the frame number. In an alternate embodiment, a linear 
prediction (LP) residual error signal is used in place of the speech signal. The
LP residue is used by speech coders such as, e.g., the CELP coder.
Computation of the LP residue is advantageously performed by providing 
the speech signal to an inverse LP filter (not shown). The transfer function 
of the inverse LP filter, A(z), is computed in accordance with the following
equation:

$$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p},$$

in which the coefficients $a_i$ are filter taps having predefined values chosen in
accordance with known methods, as described in the aforementioned U.S.
Patent No. 5,414,796 and U.S. Application Serial No. 09/217,494. The
number p indicates the number of previous samples the inverse LP filter 
uses for prediction purposes. In a particular embodiment, p is set to ten. 
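
An inverse LP filter of the form A(z) = 1 - a_1 z^-1 - ... - a_p z^-p corresponds to the time-domain relation r[n] = s[n] - (a_1 s[n-1] + ... + a_p s[n-p]). The following sketch applies that relation directly. The function name is hypothetical, samples before the start of the input are assumed to be zero, and the single-tap example is chosen only so that the result is easy to check by hand.

```python
def lp_residual(speech, taps):
    """Compute r[n] = s[n] - sum_{i=1..p} a_i * s[n-i].

    `taps` holds a_1..a_p; samples before the start of `speech` are
    treated as zero (an assumption of this sketch).
    """
    p = len(taps)
    residual = []
    for n in range(len(speech)):
        predicted = sum(
            taps[i - 1] * speech[n - i] for i in range(1, p + 1) if n - i >= 0
        )
        residual.append(speech[n] - predicted)
    return residual


# A decaying signal s[n] = 0.5**n is predicted exactly by one tap a_1 = 0.5,
# so only the first residual sample is nonzero.
speech = [1.0, 0.5, 0.25, 0.125]
residual = lp_residual(speech, taps=[0.5])
```

With p set to ten, as in the particular embodiment, `taps` would hold ten coefficients obtained by an LP analysis that is outside the scope of this sketch.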

The parameter calculator 406 derives various parameters based on the
current frame. In one embodiment these parameters include at least one of 
the following: linear predictive coding (LPC) filter coefficients, line spectral 
pair (LSP) coefficients, normalized autocorrelation functions (NACFs),
open-loop lag, zero crossing rates, band energies, and the formant residual signal.
Computation of LPC coefficients, LSP coefficients, open-loop lag, band 
energies, and the formant residual signal is described in detail in the 
aforementioned U.S. Patent No. 5,414,796. Computation of NACFs and zero 
crossing rates is described in detail in the aforementioned U.S. Patent No.
5,911,128.
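
As a hedged illustration of three of the parameters listed above, the following minimal Python sketch computes a zero crossing rate, a normalized autocorrelation value at a single lag, and an LP residue obtained by applying the inverse filter A(z) to a frame. These are textbook definitions and are not taken from the referenced patents; the function names, the zero-history assumption at the start of the frame, and the tap values used by a caller are assumptions made for illustration only.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def nacf(frame, lag):
    """Normalized autocorrelation of the frame at one lag (assumed 0 < lag < len)."""
    x, y = frame[lag:], frame[:-lag]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def lp_residual(frame, taps):
    """Apply A(z) = 1 - a_1 z^-1 - ... - a_p z^-p, assuming zero history before the frame."""
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(
            a * frame[n - i] for i, a in enumerate(taps, start=1) if n - i >= 0
        )
        residual.append(sample - prediction)
    return residual
```

For a perfectly periodic frame, `nacf` evaluated at the period approaches one, which is the property a periodicity-based classifier would rely on.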

The parameter calculator 406 is coupled to the mode classification 
module 408. The parameter calculator 406 provides the parameters to the 
mode classification module 408. The mode classification module 408 is 
coupled to dynamically switch between the encoding modes 410 on a
frame-by-frame basis in order to select the most appropriate encoding mode 410 for
the current frame. The mode classification module 408 selects a particular 
encoding mode 410 for the current frame by comparing the parameters with 
predefined threshold and/or ceiling values. Based upon the energy content 
of the frame, the mode classification module 408 classifies the frame as
nonspeech, or inactive speech (e.g., silence, background noise, or pauses
between words), or speech. Based upon the periodicity of the frame, the 
mode classification module 408 then classifies speech frames as a particular 
type of speech, e.g., voiced, unvoiced, or transient. 

Voiced speech is speech that exhibits a relatively high degree of
periodicity. A segment of voiced speech is shown in the graph of FIG. 6. As
illustrated, the pitch period is a component of a speech frame that may be 
used to advantage to analyze and reconstruct the contents of the frame. 
Unvoiced speech typically comprises consonant sounds. Transient speech 
frames are typically transitions between voiced and unvoiced speech.
Frames that are classified as neither voiced nor unvoiced speech are
classified as transient speech. It would be understood by those skilled in the 
art that any reasonable classification scheme could be employed. 
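
The two-stage decision described above (energy first, then periodicity) might be sketched as follows. This is a minimal illustration only: the function name `classify_frame`, the use of a single scalar periodicity measure, and all three threshold values are hypothetical and are not taken from the referenced applications.

```python
def classify_frame(energy, periodicity,
                   energy_floor=1e-3, voiced_min=0.7, unvoiced_max=0.3):
    """Classify one frame from its energy and a periodicity measure in [0, 1].

    All thresholds are illustrative placeholders, not values from the source.
    """
    # Stage 1: energy content separates inactive frames from active speech.
    if energy < energy_floor:
        return "nonspeech"
    # Stage 2: periodicity separates the types of active speech.
    if periodicity >= voiced_min:
        return "voiced"
    if periodicity <= unvoiced_max:
        return "unvoiced"
    # Frames that are neither voiced nor unvoiced are treated as transient.
    return "transient"
```

The fall-through to "transient" mirrors the statement that frames classified as neither voiced nor unvoiced are classified as transient speech.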

Classifying the speech frames is advantageous because different 
encoding modes 410 can be used to encode different types of speech, resulting
in more efficient use of bandwidth in a shared channel such as the
communication channel 404. For example, as voiced speech is periodic and
thus highly predictable, a low-bit-rate, highly predictive encoding mode 410
can be employed to encode voiced speech. Classification modules such as
the classification module 408 are described in detail in the aforementioned
U.S. Application Serial No. 09/217,341 and in U.S. Application Serial No. 
09/259,151 entitled CLOSED-LOOP MULTIMODE MIXED-DOMAIN LINEAR 
PREDICTION (MDLP) SPEECH CODER, filed February 26, 1999, assigned to
the assignee of the present invention, and fully incorporated herein by
reference. 

The mode classification module 408 selects an encoding mode 410 for 
the current frame based upon the classification of the frame. The various 
encoding modes 410 are coupled in parallel. One or more of the encoding
modes 410 may be operational at any given time. Nevertheless, only one
encoding mode 410 advantageously operates at any given time, and is 
selected according to the classification of the current frame. 

The different encoding modes 410 advantageously operate according 
to different coding bit rates, different coding schemes, or different
combinations of coding bit rate and coding scheme. The various coding rates
used may be full rate, half rate, quarter rate, and/or eighth rate. The various 
coding schemes used may be CELP coding, prototype pitch period (PPP) 
coding (or waveform interpolation (WI) coding), and/or noise excited linear 
prediction (NELP) coding. Thus, for example, a particular encoding mode
410 could be full rate CELP, another encoding mode 410 could be half rate
CELP, another encoding mode 410 could be quarter rate PPP, and another 
encoding mode 410 could be NELP. 

In accordance with a CELP encoding mode 410, a linear predictive 
vocal tract model is excited with a quantized version of the LP residual
signal. The quantized parameters for the entire previous frame are used to
reconstruct the current frame. The CELP encoding mode 410 thus provides 
for relatively accurate reproduction of speech but at the cost of a relatively 
high coding bit rate. The CELP encoding mode 410 may advantageously be 
used to encode frames classified as transient speech. An exemplary variable
rate CELP speech coder is described in detail in the aforementioned U.S.
Patent No. 5,414,796. 

In accordance with a NELP encoding mode 410, a filtered, pseudo- 
random noise signal is used to model the speech frame. The NELP encoding 
mode 410 is a relatively simple technique that achieves a low bit rate. The
NELP encoding mode 410 may be used to advantage to encode frames
classified as unvoiced speech. An exemplary NELP encoding mode is 
described in detail in the aforementioned U.S. Application Serial No. 
09/217,494.

In accordance with a PPP encoding mode 410, only a subset of the pitch
periods within each frame are encoded. The remaining periods of the 
speech signal are reconstructed by interpolating between these prototype 
periods. In a time-domain implementation of PPP coding, a first set of
parameters is calculated that describes how to modify a previous prototype
period to approximate the current prototype period. One or more 
codevectors are selected which, when summed, approximate the difference 
between the current prototype period and the modified previous prototype 
period. A second set of parameters describes these selected codevectors. In a
frequency-domain implementation of PPP coding, a set of parameters is
calculated to describe amplitude and phase spectra of the prototype. This 
may be done either in an absolute sense, or predictively as described 
hereinbelow. In either implementation of PPP coding, the decoder 
synthesizes an output speech signal by reconstructing a current prototype
based upon the first and second sets of parameters. The speech signal is then
interpolated over the region between the current reconstructed prototype 
period and a previous reconstructed prototype period. The prototype is thus 
a portion of the current frame that will be linearly interpolated with 
prototypes from previous frames that were similarly positioned within the
frame in order to reconstruct the speech signal or the LP residual signal at
the decoder (i.e., a past prototype period is used as a predictor of the current 
prototype period). An exemplary PPP speech coder is described in detail in 
the aforementioned U.S. Application Serial No. 09/217,494. 
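
The interpolation step described above can be illustrated with a deliberately simplified sketch. It assumes that the previous and current prototypes have the same length and that the frame holds a whole number of pitch periods; a practical PPP coder must also handle changing pitch lag and prototype alignment, which are outside this sketch. The function name and the linear weighting schedule are assumptions.

```python
def interpolate_prototypes(prev_proto, cur_proto, num_periods):
    """Rebuild a frame by linearly cross-fading from prev_proto to cur_proto.

    Simplifying assumptions: equal-length prototypes and an integer number
    of pitch periods per frame.
    """
    if len(prev_proto) != len(cur_proto):
        raise ValueError("this sketch requires equal-length prototypes")
    frame = []
    for k in range(1, num_periods + 1):
        w = k / num_periods  # weight of the current prototype grows to 1
        frame.extend((1 - w) * p + w * c for p, c in zip(prev_proto, cur_proto))
    return frame
```

The last reconstructed period equals the current prototype, so the next frame can start its own interpolation from it.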

Coding the prototype period rather than the entire speech frame
reduces the required coding bit rate. Frames classified as voiced speech may
advantageously be coded with a PPP encoding mode 410. As illustrated in 
FIG. 6, voiced speech contains slowly time-varying, periodic components 
that are exploited to advantage by the PPP encoding mode 410. By exploiting 
the periodicity of the voiced speech, the PPP encoding mode 410 is able to
achieve a lower bit rate than the CELP encoding mode 410.

The selected encoding mode 410 is coupled to the packet formatting 
module 412. The selected encoding mode 410 encodes, or quantizes, the 
current frame and provides the quantized frame parameters to the packet 
formatting module 412. The packet formatting module 412 advantageously
assembles the quantized information into packets for transmission over the
communication channel 404. In one embodiment the packet formatting 
module 412 is configured to provide error correction coding and format the 
packet in accordance with the IS-95 standard. The packet is provided to a
transmitter (not shown), converted to analog format, modulated, and
transmitted over the communication channel 404 to a receiver (also not 
shown), which receives, demodulates, and digitizes the packet, and provides 
the packet to the decoder 402.

In the decoder 402, the packet disassembler and packet loss detector
module 414 receives the packet from the receiver. The packet disassembler
and packet loss detector module 414 is coupled to dynamically switch 
between the decoding modes 416 on a packet-by-packet basis. The number of 
decoding modes 416 is the same as the number of encoding modes 410, and
as one skilled in the art would recognize, each numbered encoding mode 410
is associated with a respective similarly numbered decoding mode 416 
configured to employ the same coding bit rate and coding scheme. 

If the packet disassembler and packet loss detector module 414 detects 
the packet, the packet is disassembled and provided to the pertinent
decoding mode 416. If the packet disassembler and packet loss detector
module 414 does not detect a packet, a packet loss is declared and the erasure
decoder 418 advantageously performs frame erasure processing as described
in a related application filed herewith, entitled FRAME ERASURE 
COMPENSATION METHOD IN A VARIABLE RATE SPEECH CODER,
assigned to the assignee of the present invention, and fully incorporated
herein by reference. 

The parallel array of decoding modes 416 and the erasure decoder 418 
are coupled to the post filter 420. The pertinent decoding mode 416 decodes, 
or de-quantizes, the packet and provides the information to the post filter 420.
The post filter 420 reconstructs, or synthesizes, the speech frame, outputting
synthesized speech frames, $\hat{s}(n)$. Exemplary decoding modes and post filters
are described in detail in the aforementioned U.S. Patent No. 5,414,796 and 
U.S. Application Serial No. 09/217,494. 

In one embodiment the quantized parameters themselves are not
transmitted. Instead, codebook indices specifying addresses in various
lookup tables (LUTs) (not shown) in the decoder 402 are transmitted. The 
decoder 402 receives the codebook indices and searches the various codebook 
LUTs for appropriate parameter values. Accordingly, codebook indices for 
parameters such as, e.g., pitch lag, adaptive codebook gain, and LSP may be
transmitted, and three associated codebook LUTs are searched by the decoder
402. 
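
A minimal sketch of index-based transmission follows, in which the encoder sends the index of the nearest table entry and the decoder reads the value back from an identical table. The table contents, their sizes, and the names `TABLES`, `nearest_index`, and `lookup` are hypothetical. Only two scalar tables are shown; an LSP table would hold vectors and use a vector distance instead.

```python
# Hypothetical lookup tables shared by encoder and decoder.
TABLES = {
    "pitch_lag": list(range(20, 148)),                              # 128 entries, 7-bit index
    "adaptive_gain": [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0],   # 8 entries, 3-bit index
}


def nearest_index(value, table):
    """Encoder side: index of the table entry closest to the value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))


def lookup(indices, tables):
    """Decoder side: map each received index back to its table entry."""
    return {name: tables[name][i] for name, i in indices.items()}


indices = {
    "pitch_lag": nearest_index(57, TABLES["pitch_lag"]),
    "adaptive_gain": nearest_index(0.8, TABLES["adaptive_gain"]),
}
decoded = lookup(indices, TABLES)
```

Only the integer indices cross the channel; the parameter values themselves never do.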

In accordance with the CELP encoding mode 410, pitch lag, amplitude, 
phase, and LSP parameters are transmitted. The LSP codebook indices are
transmitted because the LP residue signal is to be synthesized at the decoder
402. Additionally, the difference between the pitch lag value for the current 
frame and the pitch lag value for the previous frame is transmitted. 

In accordance with a conventional PPP encoding mode in which the
speech signal is to be synthesized at the decoder, only the pitch lag,
amplitude, and phase parameters are transmitted. The lower bit rate 
employed by conventional PPP speech coding techniques does not permit 
transmission of both absolute pitch lag information and relative pitch lag 
difference values. 

In accordance with one embodiment, highly periodic frames such as
voiced speech frames are transmitted with a low-bit-rate PPP encoding mode
410 that quantizes the difference between the pitch lag value for the current 
frame and the pitch lag value for the previous frame for transmission, and 
does not quantize the pitch lag value for the current frame for transmission.
Because voiced frames are highly periodic in nature, transmitting the
difference value as opposed to the absolute pitch lag value allows a lower 
coding bit rate to be achieved. In one embodiment this quantization is 
generalized such that a weighted sum of the parameter values for previous 
frames is computed, wherein the sum of the weights is one, and the
weighted sum is subtracted from the parameter value for the current frame.
The difference is then quantized. 
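
The generalized scheme just described (weighted sum of past values with weights summing to one, subtraction, then quantization of the difference) might be sketched as below. The uniform scalar quantizer with a fixed step size is an assumption chosen for brevity; the text does not prescribe a particular quantizer, and the function names are hypothetical.

```python
def predict(history, weights):
    """Weighted sum of past parameter values; the weights must sum to one."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * v for w, v in zip(weights, history))


def encode(value, history, weights, step):
    """Quantize the difference between the current value and the prediction."""
    return round((value - predict(history, weights)) / step)


def decode(index, history, weights, step):
    """Rebuild the value from the quantized difference and the same prediction."""
    return predict(history, weights) + index * step
```

With a single past frame and a weight of one, this reduces to quantizing the plain frame-to-frame difference.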

In one embodiment predictive quantization of LPC parameters is
performed in accordance with the following description. The LPC
parameters are converted into line spectral information (LSI) (or LSPs),
which are known to be more suitable for quantization. The N-dimensional
LSI vector for the M-th frame may be denoted as $L_M^n$, $n = 0, 1, \ldots, N-1$. In the
predictive quantization scheme, the target error vector for quantization is
computed in accordance with the following equation:


The contributions, U, can be equal to the quantized or unquantized
LSI parameters of the corresponding past frame. Such a scheme is known as 
an auto regressive (AR) method. Alternatively, the contributions, U, can be
equal to the quantized or unquantized error vector corresponding to the LSI
parameters of the corresponding past frame. Such a scheme is known as a
moving average (MA) method. 

The target error vector, T, is then quantized to $\hat{T}$ using any of various
known vector quantization (VQ) techniques including, e.g., split VQ or
multistage VQ. Various VQ techniques are described in A. Gersho & R.M.
Gray, Vector Quantization and Signal Compression (1992). The quantized
LSI vector is then reconstructed from the quantized target error vector, $\hat{T}$,
using the following equation:

In one embodiment the above-described quantization scheme is 
implemented with P=2, N=10, and 


The above-listed target vector, T, may advantageously be quantized using
sixteen bits through the well-known split VQ method.

Due to their periodic nature, voiced frames can be coded using a
scheme in which the entire set of bits is used to quantize one prototype pitch
period, or a finite set of prototype pitch periods, of the frame of a known
length. This length of the prototype pitch period is called the pitch lag.
These prototype pitch periods, and possibly the prototype pitch periods of
adjacent frames, may then be used to reconstruct the entire speech frame
without loss of perceptual quality. This PPP scheme of extracting the
prototype pitch period from a frame of speech and using these prototypes for
reconstructing the entire frame is described in the aforementioned U.S.
Application Serial No. 09/217,494.

In one embodiment a quantizer 500 is used to quantize highly periodic
frames such as voiced frames in accordance with a PPP coding scheme, as
shown in FIG. 8. The quantizer 500 includes a prototype extractor 502, a
frequency domain converter 504, an amplitude quantizer 506, and a phase
quantizer 508. The prototype extractor 502 is coupled to the frequency
domain converter 504. The frequency domain converter 504 is coupled to 
the amplitude quantizer 506 and to the phase quantizer 508. 

The prototype extractor 502 extracts a pitch period prototype from a
frame of speech, s(n). In an alternate embodiment, the frame is a frame of
LP residue. The prototype extractor 502 provides the pitch period prototype 
to the frequency domain converter 504. The frequency domain converter 
504 transforms the prototype from a time-domain representation to a 
frequency-domain representation in accordance with any of various known
methods including, e.g., discrete Fourier transform (DFT) or fast Fourier
transform (FFT). The frequency domain converter 504 generates an 
amplitude vector and a phase vector. The amplitude vector is provided to 
the amplitude quantizer 506, and the phase vector is provided to the phase 
quantizer 508. The amplitude quantizer 506 quantizes the set of amplitudes,
generating a quantized amplitude vector, $\hat{A}$, and the phase quantizer 508
quantizes the set of phases, generating a quantized phase vector, $\hat{\phi}$.
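
The conversion performed ahead of the two quantizers can be sketched with a direct DFT, as below. This is an illustrative sketch only: the function names are hypothetical, a direct O(n^2) DFT stands in for whichever transform an implementation would use, and keeping only the non-redundant half of the spectrum is a design choice that assumes a real-valued prototype.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def amplitude_phase(prototype):
    """Split the spectrum of a real prototype into amplitude and phase vectors."""
    spectrum = dft(prototype)
    half = len(prototype) // 2 + 1  # remaining bins are complex conjugates
    amplitudes = [abs(c) for c in spectrum[:half]]
    phases = [cmath.phase(c) for c in spectrum[:half]]
    return amplitudes, phases
```

The two returned lists correspond to the amplitude vector and the phase vector that are handed to separate quantizers.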

Other schemes for coding voiced frames, such as, e.g., multiband 
excitation (MBE) speech coding and harmonic coding, transform the entire 
frame (either LP residue or speech) or parts thereof into frequency-domain
values through Fourier transform representations comprising amplitudes
and phases that can be quantized and used for synthesis into speech at the 
decoder (not shown). To use the quantizer of FIG. 8 with such coding 
schemes, the prototype extractor 502 is omitted, and the frequency domain 
converter 504 serves to decompose the complex short-term frequency
spectral representations of the frame into an amplitude vector and a phase
vector. And in either coding scheme, a suitable windowing function such 
as, e.g., a Hamming window, may first be applied. An exemplary MBE 
speech coding scheme is described in D.W. Griffin & J.S. Lim, "Multiband
Excitation Vocoder," 36(8) IEEE Trans. on ASSP (Aug. 1988). An exemplary
harmonic speech coding scheme is described in L.B. Almeida & J.M. Tribolet,
"Harmonic Coding: A Low Bit-Rate, Good Quality, Speech Coding
Technique," Proc. ICASSP '82 1664-1667 (1982).

Certain parameters must be quantized for any of the above voiced 
frame coding schemes. These parameters are the pitch lag or the pitch
frequency, and the prototype pitch period waveform of pitch lag length, or
the short-term spectral representations (e.g., Fourier representations) of the 
entire frame or a piece thereof.

In one embodiment predictive quantization of the pitch lag or the
pitch frequency is performed in accordance with the following description. 
The pitch frequency and the pitch lag can be uniquely obtained from one 
another by scaling the reciprocal of the other with a fixed scale factor. 
Consequently, it is possible to quantize either of these values using the
following method. The pitch lag (or the pitch frequency) for the frame 'm'
may be denoted $L_m$. The pitch lag, $L_m$, can be quantized to a quantized value,
$\hat{L}_m$, according to the following equation:

$$\hat{L}_m = \hat{\delta L}_m + \eta_{m_1} L_{m_1} + \eta_{m_2} L_{m_2} + \ldots + \eta_{m_N} L_{m_N},$$

in which the values $L_{m_1}, L_{m_2}, \ldots, L_{m_N}$ are the pitch lags (or the pitch frequencies)
for frames $m_1, m_2, \ldots, m_N$, respectively, the values $\eta_{m_1}, \eta_{m_2}, \ldots, \eta_{m_N}$ are
corresponding weights, and $\hat{\delta L}_m$ is the quantized version of $\delta L_m$, which is
obtained from the following equation

$$\delta L_m = L_m - \eta_{m_1} L_{m_1} - \eta_{m_2} L_{m_2} - \ldots - \eta_{m_N} L_{m_N}$$

and quantized using any of various known scalar or vector quantization
techniques. In a particular embodiment, a low-bit-rate, voiced speech coding
scheme was implemented that quantizes $\delta L_m = L_m - L_{m-1}$ using only four bits.
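
A four-bit code for the lag difference could look like the sketch below. Four bits give sixteen levels; mapping them to the integer differences -8 through +7 with clamping is an assumption made here for illustration, since the text does not specify the quantizer levels. The sketch also assumes that the previous lag available to both encoder and decoder is the previously reconstructed one.

```python
BITS = 4
OFFSET = 1 << (BITS - 1)  # 8: shifts the range -8..+7 onto indices 0..15


def encode_lag(lag, prev_lag):
    """Return a 4-bit index for the clamped difference lag - prev_lag."""
    delta = max(-OFFSET, min(OFFSET - 1, lag - prev_lag))
    return delta + OFFSET


def decode_lag(index, prev_lag):
    """Rebuild the lag from the index and the previous reconstructed lag."""
    return prev_lag + index - OFFSET
```

Differences outside the assumed range are clamped, so a large lag jump is reconstructed only approximately; in such a case a classifier would more likely have marked the frame as transient rather than voiced.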

In one embodiment quantization of the prototype pitch period or the 
short-term spectrum of the entire frame or parts thereof is performed in 
accordance with the following description. As discussed above, the
prototype pitch period of a voiced frame can be quantized effectively (in
either the speech domain or the LP residual domain) by first transforming
the time-domain waveform into the frequency domain where the signal can 
be represented as a vector of amplitudes and phases. All or some elements 
of the amplitude and phase vectors can then be quantized separately using a 
combination of the methods described below. Also as mentioned above, in
other schemes such as MBE or harmonic coding schemes, the complex
short-term frequency spectral representations of the frame can be decomposed into
amplitudes and phase vectors. Therefore, the following quantization 
methods, or suitable interpretations of them, can be applied to any of the 
above-described coding techniques. 

In one embodiment amplitude values may be quantized as follows.
The amplitude spectrum may be a fixed-dimension vector or a
variable-dimension vector. Further, the amplitude spectrum can be represented as a
combination of a lower dimensional power vector and a normalized
amplitude spectrum vector obtained by normalizing the original amplitude 
spectrum with the power vector. The following method can be applied to 
any, or parts thereof, of the above-mentioned elements (namely, the
amplitude spectrum, the power spectrum, or the normalized amplitude
spectrum). A subset of the amplitude (or power, or normalized amplitude)
vector for frame 'm' may be denoted $A_m$. The amplitude (or power, or
normalized amplitude) prediction error vector is first computed using the
following equation:

$$\delta A_m = A_m - \alpha_{m_1}^T A_{m_1} - \alpha_{m_2}^T A_{m_2} - \ldots - \alpha_{m_N}^T A_{m_N},$$

in which the values $A_{m_1}, A_{m_2}, \ldots, A_{m_N}$ are the subset of the amplitude (or
power, or normalized amplitude) vector for frames $m_1, m_2, \ldots, m_N$, respectively,
and the values $\alpha_{m_1}^T, \alpha_{m_2}^T, \ldots, \alpha_{m_N}^T$ are the transposes of corresponding weight
vectors.

The prediction error vector can then be quantized using any of
various known VQ methods to a quantized error vector denoted $\hat{\delta A}_m$. The
quantized version of $A_m$ is then given by the following equation:

$$\hat{A}_m = \hat{\delta A}_m + \alpha_{m_1}^T A_{m_1} + \alpha_{m_2}^T A_{m_2} + \ldots + \alpha_{m_N}^T A_{m_N}.$$

The weights $\alpha$ establish the amount of prediction in the quantization
scheme. In a particular embodiment, the above-described predictive scheme
has been implemented to quantize a two-dimensional power vector using
six bits, and to quantize a nineteen-dimensional, normalized amplitude 
vector using twelve bits. In this manner, it is possible to quantize the 
amplitude spectrum of a prototype pitch period using a total of eighteen bits. 
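
A minimal sketch of predictive vector quantization of an amplitude (or power) vector follows, using a single previous frame. Two points are assumptions rather than statements from the text: the weight vector is applied element by element to the previous frame's vector, and the codebook is searched exhaustively for the minimum squared error. The tiny codebook in the usage note is hypothetical; a six-bit codebook would hold sixty-four entries.

```python
def predictive_vq_encode(current, previous, weights, codebook):
    """Return the index of the codeword closest to the prediction error."""
    error = [c - w * p for c, w, p in zip(current, weights, previous)]
    return min(
        range(len(codebook)),
        key=lambda i: sum((e - q) ** 2 for e, q in zip(error, codebook[i])),
    )


def predictive_vq_decode(index, previous, weights, codebook):
    """Add the chosen codeword back onto the same weighted prediction."""
    return [q + w * p for q, w, p in zip(codebook[index], weights, previous)]
```

For example, with `previous = [2.0, 4.0]`, `weights = [0.5, 0.5]`, and a four-entry codebook containing `[0.5, 0.0]`, the vector `[1.5, 2.0]` is coded as the index of that entry and decoded without error.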

In one embodiment phase values may be quantized as follows. A
subset of the phase vector for frame 'm' may be denoted $\phi_m$. It is possible to
quantize $\phi_m$ as being equal to the phase of a reference waveform (time
domain or frequency domain of the entire frame or a part thereof), and zero 
or more linear shifts applied to one or more bands of the transformation of 
the reference waveform. Such a quantization technique is described in U.S.
Application Serial No. 09/365,491, entitled METHOD AND APPARATUS
FOR SUBSAMPLING PHASE SPECTRUM INFORMATION, filed July 19,
1999, assigned to the assignee of the present invention, and fully
incorporated herein by reference. Such a reference waveform could be a
transformation of the waveform of frame $m_N$, or any other predetermined
waveform.

For example, in one embodiment employing a low-bit-rate, voiced
speech coding scheme, the LP residue of frame 'm-1' is first extended
according to a pre-established pitch contour (as has been incorporated into
the Telecommunications Industry Association Interim Standard TIA/EIA IS-127),
into the frame 'm'. Then a prototype pitch period is extracted from the
extended waveform in a manner similar to the extraction of the
unquantized prototype of the frame 'm'. The phases, $\phi_{m-1}$, of the extracted
prototype are then obtained. The following values are then equated:
$\hat{\phi}_m = \phi_{m-1}$. In this manner it is possible to quantize the phases of the
prototype of the frame 'm' by predicting from the phases of a transformation
of the waveform of frame 'm-1' using no bits.

In a particular embodiment, the above-described predictive
quantization schemes have been implemented to code the LPC parameters
and the LP residue of a voiced speech frame using only thirty-eight bits.
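
The thirty-eight-bit figure is consistent with the per-parameter bit counts quoted in the particular embodiments above, as the short tally below shows. Pairing those counts with this total is an inference from the text rather than something it states outright, and the dictionary keys are descriptive labels chosen here.

```python
# Bit counts quoted in the particular embodiments described above.
BIT_BUDGET = {
    "lsi_target_vector": 16,            # split VQ of the target vector T
    "pitch_lag_difference": 4,          # difference between successive pitch lags
    "power_vector": 6,                  # two-dimensional power vector
    "normalized_amplitude_vector": 12,  # nineteen-dimensional vector
    "phase": 0,                         # predicted from frame m-1, no bits sent
}

total_bits = sum(BIT_BUDGET.values())
```

The amplitude entries alone account for the eighteen bits cited for the amplitude spectrum of a prototype pitch period.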

Thus, a novel and improved method and apparatus for predictively 
quantizing voiced speech have been described. Those of skill in the art
would understand that the data, instructions, commands, information,
signals, bits, symbols, and chips that may be referenced throughout the above 
description are advantageously represented by voltages, currents, 
electromagnetic waves, magnetic fields or particles, optical fields or particles, 
or any combination thereof. Those of skill would further appreciate that the
various illustrative logical blocks, modules, circuits, and algorithm steps
described in connection with the embodiments disclosed herein may be 
implemented as electronic hardware, computer software, or combinations of 
both. The various illustrative components, blocks, modules, circuits, and 
steps have been described generally in terms of their functionality. Whether
the functionality is implemented as hardware or software depends upon the
particular application and design constraints imposed on the overall system.
Skilled artisans recognize the interchangeability of hardware and software
under these circumstances, and how best to implement the described
functionality for each particular application. As examples, the various
illustrative logical blocks, modules, circuits, and algorithm steps described in
connection with the embodiments disclosed herein may be implemented or 
performed with a digital signal processor (DSP), an application specific 
integrated circuit (ASIC), a field programmable gate array (FPGA) or other
programmable logic device, discrete gate or transistor logic, discrete
hardware components such as, e.g., registers and FIFO, a processor executing 
a set of firmware instructions, any conventional programmable software 
module and a processor, or any combination thereof designed to perform the
functions described herein. The processor may advantageously be a
microprocessor, but in the alternative, the processor may be any
conventional processor, controller, microcontroller, or state machine. The
software module could reside in RAM memory, flash memory, ROM 
memory, EPROM memory, EEPROM memory, registers, hard disk, a
removable disk, a CD-ROM, or any other form of storage medium known in
the art. As illustrated in FIG. 8, an exemplary processor 600 is 
advantageously coupled to a storage medium 602 so as to read information 
from, and write information to, the storage medium 602. In the alternative, 
the storage medium 602 may be integral to the processor 600. The processor
600 and the storage medium 602 may reside in an ASIC (not shown). The
ASIC may reside in a telephone (not shown). In the alternative, the 
processor 600 and the storage medium 602 may reside in a telephone. The 
processor 600 may be implemented as a combination of a DSP and a 
microprocessor, or as two microprocessors in conjunction with a DSP core,
etc.

Preferred embodiments of the present invention have thus been 
shown and described. It would be apparent to one of ordinary skill in the art, 
however, that numerous alterations may be made to the embodiments 
herein disclosed without departing from the spirit or scope of the invention.
Therefore, the present invention is not to be limited except in accordance
with the following claims. 



We claim:

CLAIMS

1. A method of quantizing information about a parameter of
speech, comprising:

generating at least one weighted value of the parameter for at
least one previously processed frame of speech, wherein the sum of all
weights used is one;

subtracting the at least one weighted value from a value of the
parameter for a currently processed frame of speech to yield a difference
value; and

quantizing the difference value.

2. The method of claim 1, wherein the at least one weighted value
comprises one value of the parameter for the immediately previously
processed frame of speech, the one value having a weight equal to one.

3. The method of claim 1, wherein the speech is voiced speech.

4. The method of claim 1, wherein the parameter is a pitch lag
value.

5. The method of claim 1, wherein the parameter is an amplitude
value.

6. The method of claim 1, further comprising computing the
value of the parameter for the currently processed frame of speech.

7. The method of claim 6, wherein the computing comprises
extracting a pitch period prototype from the currently processed frame of
speech and obtaining a frequency-domain representation of the pitch period
prototype.

8. The method of claim 6, wherein the computing comprises
calculating a short-term frequency-domain representation of the currently
processed frame of speech.



WO 01/82293

PCT/US01/12988

9. The method of claim 8, further comprising decomposing the
short-term frequency-domain representation into an amplitude vector and a
phase vector.

10. A speech coder configured to quantize information about a
parameter of speech, comprising:

means for generating at least one weighted value of the
parameter for at least one previously processed frame of speech, wherein the
sum of all weights used is one;

means for subtracting the at least one weighted value from a
value of the parameter for a currently processed frame of speech to yield a
difference value; and

means for quantizing the difference value.

11. An infrastructure element configured to quantize information
about a parameter of speech, comprising:

a parameter generator configured to generate at least one
weighted value of the parameter for at least one previously processed frame
of speech, wherein the sum of all weights used is one; and

a quantizer coupled to the parameter generator and configured
to subtract the at least one weighted value from a value of the parameter for
a currently processed frame of speech to yield a difference value, and to
quantize the difference value.

12. The infrastructure element of claim 11, wherein the at least one
weighted value comprises one value of the parameter for the immediately
previously processed frame of speech, the one value having a weight equal
to one.

13. The infrastructure element of claim 11, wherein the speech is
voiced speech.

14. The infrastructure element of claim 11, wherein the parameter
is a pitch lag value.

15. The infrastructure element of claim 11, wherein the parameter
is an amplitude value.




16. The infrastructure element of claim 11, wherein the parameter
generator is further configured to compute the value of the parameter for
the currently processed frame of speech.

17. The infrastructure element of claim 16, wherein the parameter
generator is further configured to extract a pitch period prototype from the
currently processed frame of speech and obtain a frequency-domain
representation of the pitch period prototype.

18. The infrastructure element of claim 16, wherein the parameter
generator is further configured to calculate a short-term frequency-domain
representation of the currently processed frame of speech.

19. The infrastructure element of claim 18, wherein the parameter
generator is further configured to decompose the short-term
frequency-domain representation into an amplitude vector and a phase vector.

20. A subscriber unit configured to quantize information about a
parameter of speech, comprising:

a processor; and

a storage medium coupled to the processor and containing a set
of instructions executable by the processor to generate at least one weighted
value of the parameter for at least one previously processed frame of speech,
wherein the sum of all weights used is one, and subtract the at least one
weighted value from a value of the parameter for a currently processed
frame of speech to yield a difference value, and to quantize the difference
value.

21. The subscriber unit of claim 20, wherein the at least one
weighted value comprises one value of the parameter for the immediately
previously processed frame of speech, the one value having a weight equal
to one.

22. The subscriber unit of claim 20, wherein the speech is voiced
speech.

23. The subscriber unit of claim 20, wherein the parameter is a
pitch lag value.

24. The subscriber unit of claim 20, wherein the parameter is an
amplitude value.

25. The subscriber unit of claim 20, wherein the set of instructions
is further executable by the processor to compute the value of the parameter
for the currently processed frame of speech.

26. The subscriber unit of claim 25, wherein the set of instructions
is further executable by the processor to extract a pitch period prototype from
the currently processed frame of speech and obtain a frequency-domain
representation of the pitch period prototype.

27. The subscriber unit of claim 25, wherein the set of instructions
is further executable by the processor to calculate a short-term
frequency-domain representation of the currently processed frame of speech.

28. The subscriber unit of claim 27, wherein the set of instructions
is further executable by the processor to decompose the short-term
frequency-domain representation into an amplitude vector and a phase
vector.

29. A method of quantizing information about a phase parameter
of speech, comprising:

    generating at least one modified value of the phase parameter
for at least one previously processed frame of speech;

    applying a number of phase shifts to the at least one modified
value, the number of phase shifts being greater than or equal to zero;

    subtracting the at least one modified value from a value of the
phase parameter for a currently processed frame of speech to yield a
difference value; and

    quantizing the difference value.

30. A speech coder configured to quantize information about a
phase parameter of speech, comprising:

    means for generating at least one modified value of the phase
parameter for at least one previously processed frame of speech;

    means for applying a number of phase shifts to the at least one
modified value, the number of phase shifts being greater than or equal to
zero;

    means for subtracting the at least one modified value from a
value of the phase parameter for a currently processed frame of speech to
yield a difference value; and

    means for quantizing the difference value.

31. A subscriber unit configured to quantize information about a
phase parameter of speech, comprising:

    a processor; and

    a storage medium coupled to the processor and containing a set
of instructions executable by the processor to generate at least one modified
value of the phase parameter for at least one previously processed frame of
speech, apply a number of phase shifts to the at least one modified value, the
number of phase shifts being greater than or equal to zero, subtract the at
least one modified value from a value of the parameter for a currently
processed frame of speech to yield a difference value, and to quantize the
difference value.
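
The predictive scheme recited in claims 20 and 21 (a weighted prediction from previously processed frames, with weights summing to one) and in claims 29 to 31 (a prediction to which zero or more phase shifts are applied) might be sketched in Python as follows. This is an illustrative sketch only, not the implementation described in the application: the function names, the uniform scalar quantizer with step size `step`, and the wrapping of phase values to [-pi, pi) are all assumptions introduced here.

```python
import math


def predict(previous_values, weights):
    """Weighted sum of parameter values from previously processed frames.

    The weights must sum to one, as recited in claim 20.
    """
    if not weights or len(previous_values) != len(weights):
        raise ValueError("need one weight per previous value")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to one")
    return sum(w * v for w, v in zip(weights, previous_values))


def quantize_difference(current_value, predicted_value, step):
    """Quantize (current - predicted) with a uniform scalar quantizer.

    The uniform quantizer is an assumption; the claims do not specify one.
    """
    return round((current_value - predicted_value) / step)


def reconstruct(index, predicted_value, step):
    """Decoder side: add the dequantized difference back to the prediction."""
    return predicted_value + index * step


def wrap_phase(phase):
    """Wrap a phase in radians to the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def quantize_phase_difference(current_phase, previous_phase, shifts, step):
    """Apply zero or more phase shifts to the previous-frame phase, then
    quantize the wrapped difference to the current-frame phase."""
    predicted = previous_phase
    for shift in shifts:  # the list may be empty (zero phase shifts)
        predicted = wrap_phase(predicted + shift)
    difference = wrap_phase(current_phase - predicted)
    return round(difference / step), predicted


# Hypothetical pitch lag values for two previous frames, equally weighted.
prediction = predict([100.0, 104.0], [0.5, 0.5])
index = quantize_difference(105.0, prediction, step=0.5)
decoded = reconstruct(index, prediction, step=0.5)
```

With a single previous frame and a weight of one, `predict` reduces to the case of claim 21. In a practical coder the previous-frame values would normally be the decoded (quantized) ones, so that encoder and decoder form the same prediction; that choice is likewise an assumption of this sketch.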



[Drawings: sheets 1/7 to 7/7 (figures not reproduced)]



INTERNATIONAL SEARCH REPORT

International Application No: PCT/US 01/12988

A. CLASSIFICATION OF SUBJECT MATTER

IPC 7  G10L19/08  G10L19/02

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols):
IPC 7  G10L

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used):
EPO-Internal, WPI Data, PAJ, INSPEC

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category ° / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No.

Category: P,X
Citation: WO 01 06495 A (QUALCOMM INC), 25 January 2001 (2001-01-25)
    page 13, line 14 - line 27
Relevant to claim No.: 1, 2, 6, 10-12, 16, 20, 21, 25

Citation: EP 0 696 026 A (NIPPON ELECTRIC CO), 7 February 1996 (1996-02-07)
    page 6, line 46 - line 58
    page 13, line 48 - page 14, line 10
    page 15, line 14 - line 20
Relevant to claim No.: 1, 3, 4, 6, 10, 11, 13, 14, 16, 20, 22, 23, 25

Further documents are listed in the continuation of box C.

Patent family members are listed in annex.

° Special categories of cited documents:

"A" document defining the general state of the art which is not
    considered to be of particular relevance

"E" earlier document but published on or after the international
    filing date

"L" document which may throw doubts on priority claim(s) or
    which is cited to establish the publication date of another
    citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
    other means

"P" document published prior to the international filing date but
    later than the priority date claimed

"T" later document published after the international filing date
    or priority date and not in conflict with the application but
    cited to understand the principle or theory underlying the
    invention

"X" document of particular relevance; the claimed invention
    cannot be considered novel or cannot be considered to
    involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
    cannot be considered to involve an inventive step when the
    document is combined with one or more other such documents,
    such combination being obvious to a person skilled
    in the art.

"&" document member of the same patent family

Date of the actual completion of the international search: 17 September 2001

Date of mailing of the international search report: 24/09/2001

Name and mailing address of the ISA:
European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl.
Fax: (+31-70) 340-3016

Authorized officer: Ramos Sanchez, U

Form PCT/ISA/210 (second sheet) (July 1992)



C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

Category ° / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No.

Citation: CHO INHWAN ET AL: "Predictive pyramid vector quantisation of LSF parameters"
    ELECTRONICS LETTERS, IEE STEVENAGE, GB, vol. 34, no. 8,
    16 April 1998 (1998-04-16), pages 735-736, XP006009612
    ISSN: 0013-5194
    the whole document
Relevant to claim No.: 1-4, 6, 10-14, 16, 20-23, 25

Citation: EP 0 987 680 A (BRITISH TELECOMM), 22 March 2000 (2000-03-22)
    page 6, line 6 - line 17
    page 6, line 39 - line 58
Relevant to claim No.: 7-9, 17-19, 26-31

Citation: US 5 884 253 A (KLEIJN WILLEM BASTIAAN), 16 March 1999 (1999-03-16)
    column 9, line 6 - line 23
Relevant to claim No.: 7-9, 17-19, 26-31

Citation: EP 0 926 660 A (TOKYO SHIBAURA ELECTRIC CO), 30 June 1999 (1999-06-30)
    column 7, line 47 - line 53
Relevant to claim No.: 5, 15, 24

Citation: WO 00 11659 A (CONEXANT SYSTEMS INC), 2 March 2000 (2000-03-02)
    page 64
Relevant to claim No.: 5, 15, 24

Form PCT/ISA/210 (continuation of second sheet) (July 1992)



Information on patent family members

Patent document           Publication    Patent family      Publication
cited in search report    date           member(s)          date

WO 0106495 A              25-01-2001     AU 6354600 A       05-02-2001
                                         WO 0106495 A1      25-01-2001

EP 0696026 A              07-02-1996     JP 3153075 B2      03-04-2001
                                         JP 8044398 A       16-02-1996
                                         JP 2907019 B2      21-06-1999
                                         JP 8076800 A       22-03-1996
                                         JP 3003531 B2      31-01-2000
                                         JP 8185199 A       16-07-1996
                                         CA 2154911 A1      03-02-1996
                                         EP 1093115 A2      18-04-2001
                                         EP 1093116 A1      18-04-2001
                                         EP 0696026 A2      07-02-1996
                                         US 5778334 A       07-07-1998

EP 0987680 A              22-03-2000     EP 0987680 A1      22-03-2000

US 5884253 A              16-03-1999     NONE

EP 0926660 A              30-06-1999     EP 0926660 A2      30-06-1999
                                         JP 11259098 A      24-09-1999

WO 0011659 A              02-03-2000     US 6260010 B1      10-07-2001
                                         EP 1105871 A1      13-06-2001
                                         EP 1105872 A1      13-06-2001
                                         EP 1105870 A1      13-06-2001
                                         EP 1110209 A1      27-06-2001
                                         WO 0011658 A1      02-03-2000
                                         WO 0011652 A1      02-03-2000
                                         WO 0011655 A1      02-03-2000
                                         WO 0011659 A1      02-03-2000
                                         WO 0011648 A1      02-03-2000
                                         WO 0011653 A1      02-03-2000
                                         WO 0011649 A1      02-03-2000
                                         WO 0011656 A1      02-03-2000
                                         WO 0011660 A1      02-03-2000
                                         WO 0011650 A1      02-03-2000
                                         WO 0011661 A1      02-03-2000
                                         WO 0011657 A1      02-03-2000
                                         WO 0011651 A1      02-03-2000
                                         WO 0011654 A1      02-03-2000
                                         US 6188980 B1      13-02-2001
                                         US 6104992 A       15-08-2000
                                         US 6173257 B1      09-01-2001
                                         US 6240386 B1      29-05-2001

Form PCT/ISA/210 (patent family annex) (July 1992)



